Apache Pig
Apache Pig is a software framework that provides a runtime environment for executing MapReduce jobs on a Hadoop cluster through a high-level scripting language called Pig Latin. A few highlights of the project:
- Pig is an abstraction (a high-level programming language) over MapReduce, running on top of a Hadoop cluster.
- Pig Latin queries/commands are compiled into one or more MapReduce jobs and then executed on a Hadoop cluster.
- Just like a real pig can eat almost anything, Apache Pig can operate on almost any kind of data.
- Pig offers an interactive shell called Grunt for executing Pig commands.
- DUMP and STORE are two of the most common commands in Pig: DUMP prints the results to the screen, and STORE writes the results to HDFS.
- Pig offers various built-in operators, functions and other constructs for performing many common operations.
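To make the points above concrete, here is a minimal Pig Latin sketch that loads a file, applies a few built-in operators and functions, and then uses both DUMP and STORE. The input path, output path, relation names and field names are all hypothetical and chosen only for illustration; the input is assumed to be a tab-delimited file with three columns.

```pig
-- Hypothetical input: a tab-delimited log with user, url and bytes columns.
logs = LOAD 'input/access_log.tsv' USING PigStorage('\t')
       AS (user:chararray, url:chararray, bytes:long);

-- Keep only the larger responses (the threshold is an arbitrary example value).
big = FILTER logs BY bytes > 1024;

-- Group the remaining records by user and aggregate with built-in functions.
by_user = GROUP big BY user;
totals = FOREACH by_user GENERATE
             group AS user,
             COUNT(big) AS hits,
             SUM(big.bytes) AS total_bytes;

-- DUMP prints the relation to the screen.
DUMP totals;

-- STORE writes the relation to the file system (HDFS when run on a cluster).
-- The output directory (hypothetical here) must not already exist.
STORE totals INTO 'output/totals' USING PigStorage(',');
```

These statements can be typed one at a time at the Grunt prompt, or saved to a file (say, a hypothetical totals.pig) and run as a script with `pig -x local totals.pig` for a local test or `pig -x mapreduce totals.pig` on a cluster. Nothing is executed until a DUMP or STORE is reached, at which point Pig compiles the preceding statements into one or more MapReduce jobs.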
Additional Information: Home Page | Wiki | Documentation/User Guide/Reference Manual | Mailing Lists